15 research outputs found
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
For this purpose, we rely on the concept of "query refinement" to estimate the
probability that a query is ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare, on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness. Comment: VLDB201
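As a rough illustration of what greedy result diversification looks like, the sketch below trades document relevance against coverage of query interpretations that the current selection has not yet satisfied. It follows the general shape of greedy selection used by methods in the xQuAD family; it is not the OptSelect algorithm, whose details the abstract does not give. The function name, the parameter `lam`, and all probabilities are assumptions made up for the example.

```python
from typing import Dict, List


def greedy_diversify(
    relevance: Dict[str, float],
    aspect_weight: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k documents, mixing relevance with aspect novelty.

    relevance[d]      : assumed relevance of document d to the query
    aspect_weight[a]  : assumed probability of interpretation a of the query
    coverage[d][a]    : assumed degree to which document d satisfies a
    lam               : hypothetical trade-off (0 = relevance only)
    """
    selected: List[str] = []
    # Probability that each aspect is still unsatisfied by the selection.
    uncovered = {a: 1.0 for a in aspect_weight}
    candidates = set(relevance)

    while candidates and len(selected) < k:

        def score(doc: str) -> float:
            doc_cov = coverage.get(doc, {})
            novelty = sum(
                aspect_weight[a] * doc_cov.get(a, 0.0) * uncovered[a]
                for a in aspect_weight
            )
            return (1.0 - lam) * relevance[doc] + lam * novelty

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Aspects covered by the chosen document become less valuable.
        for a in aspect_weight:
            uncovered[a] *= 1.0 - coverage.get(best, {}).get(a, 0.0)

    return selected


if __name__ == "__main__":
    rel = {"d1": 0.9, "d2": 0.85, "d3": 0.4}
    aspects = {"animal": 0.5, "car": 0.5}
    cov = {"d1": {"animal": 1.0}, "d2": {"animal": 1.0}, "d3": {"car": 1.0}}
    print(greedy_diversify(rel, aspects, cov, k=2))
```

Each round rescans all remaining candidates, so the cost grows with both the candidate set and the number of selected results; this per-round rescoring is the kind of overhead that an efficiency study of diversification would measure.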
The Impact of Novel Computing Architectures on Large-Scale Distributed Web Information Retrieval Systems
Web search engines are the most popular means of interaction with the Web. Building a search engine that scales to the size of the Web presents many challenges. Fast crawling technology is needed to gather the Web documents. Indexing has to process hundreds of gigabytes of data efficiently. Queries have to be handled quickly, at a rate of thousands per second. As a solution, within a datacenter, services are built up from clusters of common homogeneous PCs.
However, Information Retrieval (IR) has to face issues raised by the growing amount of Web data, as well as the number of new users. In response to these issues, cost-effective specialized hardware is available nowadays. In our opinion, this hardware is ideal for migrating distributed IR systems to computer clusters comprising heterogeneous processors in order to meet their need for computing power. Toward this end, we introduce K-model, a computational model to properly evaluate algorithms designed for such hardware.
We study the impact of K-model rules on algorithm design. To evaluate the benefits of using K-model in evaluating algorithms, we compare the complexity of a solution built using our properly designed techniques with that of the existing ones. Although in theory the competitors are more efficient than our solutions, K-model is able to explain why, empirically, our solutions have been shown to be faster than the state-of-the-art implementations
QuickRank: a C++ Suite of Learning to Rank Algorithms
Ranking is a central task of many Information Retrieval (IR) problems, particularly challenging in the case of large-scale Web collections where it involves effectiveness requirements and efficiency constraints that are not common to other ranking-based applications. This paper describes QuickRank, a C++ suite of efficient and effective Learning to Rank (LtR) algorithms that allows high-quality ranking functions to be devised from possibly huge training datasets. QuickRank is a project with a double goal: i) answering the industrial need of Tiscali S.p.A. for a flexible and scalable LtR solution for learning ranking models from huge training datasets; ii) providing the IR research community with a flexible, extensible and efficient LtR framework to design LtR solutions and fairly compare the performance of different algorithms and ranking models. This paper presents our choices in designing QuickRank and reports some preliminary use experiences
Online Convergent Scheduling: An Approach for Scheduling Batch Jobs on Grids
Design and evaluation of a system for managing the scheduling phase of a stream of jobs with multiple constraints, not known a priori, on a grid dedicated to utility computing. Scheduling is managed with the convergent scheduling technique, which assigns an independent priority to every possible job-machine allocation and then uses an efficient and optimized procedure for the final matching
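The idea of giving every job-machine pair its own priority and then matching can be sketched as follows. This is only an illustration under stated assumptions: the two heuristics, the field names (`work`, `deadline`, `speed`), and the greedy matching step are all hypothetical stand-ins, since the abstract does not specify the actual heuristics or the matching procedure.

```python
from typing import Callable, Dict, List

Job = Dict[str, float]
Machine = Dict[str, float]
Heuristic = Callable[[Job, Machine], float]


def deadline_heuristic(job: Job, machine: Machine) -> float:
    """Hypothetical: reward machines that finish the job before its deadline."""
    finish_time = job["work"] / machine["speed"]
    return 1.0 if finish_time <= job["deadline"] else -1.0


def urgency_heuristic(job: Job, machine: Machine) -> float:
    """Hypothetical: prefer fast machines for jobs with tight deadlines."""
    return machine["speed"] / job["deadline"]


def build_priorities(
    jobs: Dict[str, Job],
    machines: Dict[str, Machine],
    heuristics: List[Heuristic],
) -> Dict[tuple, float]:
    """Each heuristic independently adds its contribution to every pair."""
    return {
        (j, m): sum(h(job, machine) for h in heuristics)
        for j, job in jobs.items()
        for m, machine in machines.items()
    }


def greedy_match(priorities: Dict[tuple, float]) -> Dict[str, str]:
    """Assumed matching step: take pairs by decreasing priority,
    using each job and each machine at most once."""
    assignment: Dict[str, str] = {}
    used_machines = set()
    for (job, machine), _ in sorted(priorities.items(), key=lambda kv: -kv[1]):
        if job not in assignment and machine not in used_machines:
            assignment[job] = machine
            used_machines.add(machine)
    return assignment


if __name__ == "__main__":
    jobs = {"j1": {"work": 10, "deadline": 5}, "j2": {"work": 4, "deadline": 10}}
    machines = {"fast": {"speed": 4}, "slow": {"speed": 1}}
    table = build_priorities(jobs, machines, [deadline_heuristic, urgency_heuristic])
    print(greedy_match(table))
```

Keeping each heuristic as a separate function that only adds to the shared table is what makes the scheme modular: a new constraint means a new function, with no change to the matching step.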
A Job Scheduling Framework for Large Computing Farms
In this paper, we propose a new method, called Convergent Scheduling, for scheduling a continuous stream of batch
jobs on the machines of large-scale computing farms. This
method exploits a set of heuristics that guide the scheduler in making decisions. Each heuristic manages a specific
problem constraint, and contributes to computing a value
that measures the degree of matching between a job and a
machine. Scheduling choices are taken to meet the QoS requested by the submitted jobs, and to optimize the usage of
hardware and software resources. We compared it with some
of the most common job scheduling algorithms, i.e. Backfilling
and Earliest Deadline First. Convergent Scheduling
is able to compute good assignments, while being a simple
and modular algorithm
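For readers unfamiliar with the Earliest Deadline First baseline named above, a minimal single-machine sketch follows. It assumes all jobs are available at time zero and run without preemption; the tuple layout and function name are hypothetical, and it does not reproduce the experimental setup of the paper.

```python
import heapq
from typing import List, Tuple


def edf_schedule(
    jobs: List[Tuple[str, float, float]],
) -> List[Tuple[str, float, bool]]:
    """Run jobs in order of increasing deadline on one machine.

    jobs   : (name, duration, deadline) tuples, all ready at time 0
    returns: (name, finish_time, deadline_met) in execution order
    """
    # Heap keyed on deadline; name breaks ties deterministically.
    heap = [(deadline, name, duration) for name, duration, deadline in jobs]
    heapq.heapify(heap)

    clock = 0.0
    schedule = []
    while heap:
        deadline, name, duration = heapq.heappop(heap)
        clock += duration
        schedule.append((name, clock, clock <= deadline))
    return schedule


if __name__ == "__main__":
    for entry in edf_schedule([("a", 3, 10), ("b", 2, 4), ("c", 4, 8)]):
        print(entry)
```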
Effective Data Access Patterns on Massively Parallel Processors
© 2014 John Wiley & Sons, Inc. The new generation of microprocessors incorporates a huge number of cores on the same chip. Graphics processing units are an example of this kind of architecture. This chapter discusses the characteristics and the issues of the memory systems of such architectures. It analyzes these architectures from a theoretical point of view, using the K-model to estimate the complexity of a given algorithm defined on this computational model. The chapter describes how the K-model can be used to design efficient data access patterns for implementing efficient GPU algorithms. It introduces some preliminary details of many-core architectures, describes the K-model, and analyzes two applications, parallel prefix sum and bitonic sorting networks, by means of the K-model. Finally, the chapter concludes that the experiments conducted demonstrate that the K-model can be fruitfully exploited to design efficient algorithms for computational platforms with many cores
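To make the notion of a data access pattern concrete, the sketch below simulates a step-doubling parallel prefix sum (the Hillis-Steele formulation) sequentially in Python. It is an assumption that this formulation is a useful illustration here; the abstract does not say which prefix-sum variant the chapter analyzes, and the code does not model the K-model's cost rules.

```python
from typing import List


def inclusive_scan(values: List[int]) -> List[int]:
    """Inclusive prefix sum in ceil(log2(n)) rounds.

    In each round, position i adds the element `stride` places to its left.
    On a many-core device every position would be handled by its own thread;
    here a list comprehension stands in for one parallel round.
    """
    data = list(values)
    stride = 1
    while stride < len(data):
        # Build the new buffer only from the previous round's values,
        # mirroring the double buffering a parallel version needs.
        data = [
            data[i] + data[i - stride] if i >= stride else data[i]
            for i in range(len(data))
        ]
        stride *= 2
    return data


if __name__ == "__main__":
    print(inclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

The stride doubles every round, so which memory locations are read together changes from round to round; how such strided reads map onto the memory system is the kind of question a model of many-core memory access is meant to answer.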
Efficient Diversification of Search Results using Query Logs
We study the problem of diversifying search results by exploiting the knowledge mined from query logs. Our proposal exploits the presence of different “specializations” of queries in query logs to detect the submission of ambiguous/faceted queries, and manages them by diversifying the search results returned in order to cover the different possible interpretations of the query. We present an original formulation of the results diversification problem in terms of an objective function to be maximized, for which an optimal solution can be found in linear time
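One simple way to picture mining specializations from a query log is sketched below. Everything here is an assumption for illustration: the log is modeled as (query, follow-up query) pairs, a specialization is taken to be a follow-up that extends the original query with extra terms, and the thresholds are arbitrary. The abstract does not describe the actual mining procedure.

```python
from collections import Counter
from typing import Dict, List, Tuple


def specialization_probs(
    log: List[Tuple[str, str]], query: str
) -> Dict[str, float]:
    """Estimate how often each specialization follows `query` in the log.

    Assumption: a follow-up counts as a specialization when it starts
    with the original query followed by additional terms.
    """
    counts = Counter(
        follow_up
        for original, follow_up in log
        if original == query and follow_up.startswith(query + " ")
    )
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()} if total else {}


def is_ambiguous(
    log: List[Tuple[str, str]],
    query: str,
    min_share: float = 0.2,
    min_specializations: int = 2,
) -> bool:
    """Hypothetical rule: a query is ambiguous when several
    specializations each take a non-trivial share of its refinements."""
    probs = specialization_probs(log, query)
    frequent = [q for q, p in probs.items() if p >= min_share]
    return len(frequent) >= min_specializations


if __name__ == "__main__":
    log = [
        ("jaguar", "jaguar car"),
        ("jaguar", "jaguar car"),
        ("jaguar", "jaguar animal"),
        ("jaguar", "jaguar os"),
        ("python", "python tutorial"),
    ]
    print(specialization_probs(log, "jaguar"))
    print(is_ambiguous(log, "jaguar"), is_ambiguous(log, "python"))
```

The estimated shares could then serve as weights for the interpretations that a diversified result list should cover.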
A Multilevel Scheduler for Batch Jobs on Grids
This paper proposes a two-level scheduler for dynamically scheduling a continuous stream of sequential and multi-threaded batch jobs on grids, made up of interconnected clusters of heterogeneous single-processor and/or symmetric multiprocessor machines. The scheduler aims to schedule arriving jobs while respecting their computational and deadline requirements and optimizing the hardware and software resource usage. At the top of the hierarchy, a lightweight meta-scheduler (MS) classifies incoming jobs according to their requirements, and schedules them among the underlying resources, balancing the workload. At cluster level, a Flexible Backfilling algorithm carries out the job-machine associations by exploiting dynamic information about the environment. Scheduling decisions at both levels are based on job priorities computed by using different sets of heuristics. The different proposals have been compared through simulations. Performance figures show the feasibility of our approach